
Conversation

@FeiDaLI (Contributor) commented Oct 5, 2025

What type of PR is this?
feat(llm-katan-server): add a lightweight real-LLM backend with extended context length. The server forwards OpenAI-compatible requests to a running llm-katan instance and returns its responses.

What this PR does / why we need it:

This PR introduces a new LLM Katan Server that serves as a lightweight, real LLM backend alternative to mock-vllm.

  1. Real LLM Backend: Adds a FastAPI wrapper around llm-katan that returns actual LLM completions instead of mock responses
  2. Extended Context Length: Removes the 512-token truncation limit, raising it to 100k tokens to prevent prompt truncation and the information loss it causes
  3. OpenAI-Compatible API: Keeps the same API design as mock-vllm for seamless integration
  4. Updated Documentation: Adds setup instructions and usage examples
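
The forwarding behaviour described in points 1 and 3 can be sketched roughly as follows. This is a minimal illustration, not the PR's actual app.py: the endpoint path and JSON fields follow the OpenAI chat-completions convention, and `LLM_KATAN_URL` plus `build_forward_request` are hypothetical names used here for clarity.

```python
import json
import urllib.request

# Hypothetical address of the running llm-katan instance the server proxies to.
LLM_KATAN_URL = "http://localhost:8000"

def build_forward_request(body: dict) -> urllib.request.Request:
    """Wrap an OpenAI-compatible chat request for forwarding to llm-katan.

    The request body is passed through unchanged, so any OpenAI client
    talking to this server sees the same schema it expects from vLLM.
    """
    payload = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        f"{LLM_KATAN_URL}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Example: an OpenAI-style chat request is forwarded verbatim.
req = build_forward_request(
    {"model": "qwen", "messages": [{"role": "user", "content": "hi"}]}
)
```

In the real server a framework such as FastAPI would receive the incoming request, call something like the function above, and stream the backend's response back to the client.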


netlify bot commented Oct 5, 2025

Deploy Preview for vllm-semantic-router ready!

🔨 Latest commit: e034d8e
🔍 Latest deploy log: https://app.netlify.com/projects/vllm-semantic-router/deploys/68e2640837818900089b6901
😎 Deploy Preview: https://deploy-preview-348--vllm-semantic-router.netlify.app


github-actions bot commented Oct 5, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 tools

Owners: @yuluo-yx, @rootfs, @Xunzhuo
Files changed:

  • tools/llm-katan-server/Dockerfile
  • tools/llm-katan-server/README.md
  • tools/llm-katan-server/app.py
  • tools/llm-katan-server/requirements.txt

📁 Root Directory

Owners: @rootfs, @Xunzhuo
Files changed:

  • .pre-commit-config.yaml

📁 candle-binding

Owners: @rootfs
Files changed:

  • candle-binding/src/lib.rs

📁 e2e-tests

Owners: @yossiovadia
Files changed:

  • e2e-tests/06-pii-detection-test.py

📁 src

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

  • src/training/classifier_model_fine_tuning/ft_linear.py
  • src/training/dual_classifier/dual_classifier.py
  • src/training/dual_classifier/trainer.py
  • src/training/prompt_guard_fine_tuning/jailbreak_bert_finetuning.py
  • src/training/training_lora/classifier_model_fine_tuning_lora/ft_linear_lora.py
  • src/training/training_lora/pii_model_fine_tuning_lora/pii_bert_finetuning_lora.py
  • src/training/training_lora/prompt_guard_fine_tuning_lora/jailbreak_bert_finetuning_lora.py

📁 website

Owners: @Xunzhuo
Files changed:

  • website/docs/installation/installation.md


🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

candle-binding/src/lib.rs

```diff
 tokenizer
     .with_truncation(Some(TruncationParams {
-        max_length: max_length.unwrap_or(512),
+        max_length: max_length.unwrap_or(100000),
```
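
The effect of the old 512-token cap can be illustrated with a toy truncation over plain integer token ids. This is a sketch of the general behaviour, not the tokenizer's actual implementation:

```python
def truncate(token_ids: list, max_length: int) -> list:
    """Keep at most max_length tokens, mirroring tokenizer truncation."""
    return token_ids[:max_length]

prompt = list(range(1000))       # a 1000-token prompt
old = truncate(prompt, 512)      # old cap: the prompt's tail is silently dropped
new = truncate(prompt, 100_000)  # new cap: the prompt passes through intact
```

With the old default, everything after token 512 never reached the model, which is the information loss the PR description refers to.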
Collaborator
Changes to the candle binding go to feat-candle-refactoring.

@FeiDaLI FeiDaLI closed this Oct 5, 2025
Labels: None yet
Projects: None yet

5 participants